The capacity to resist attacks from the environment is crucial to the survival of all organisms. 
We quantitatively analyze the susceptibility of protein interaction networks of numerous organisms 
to random and malicious attacks. We find for all organisms studied that random rewiring improves 
protein network robustness, so that actual networks are more fragile than rewired surrogates. This 
unexpected fragility contrasts with the behavior of networks such as the Internet, whose robustness 
decreases with random rewiring. We trace this surprising effect to the modular structure of protein 
networks. 



INTRODUCTION 

Over the past two decades, prodigiously detailed maps 
of protein-protein interaction (PPI) networks have been produced 
[1, 2]. These networks in principle present a record of all 
metabolic processes and their inter-relations, but in prac- 
tice the number of chemical actors and the complexity of 
their interactions make the networks difficult to decipher. 
In this letter, we show that notwithstanding their 
apparent complexity, it is possible to establish common 
features of protein networks starting from a few simple 
principles [1HZ]- 

We begin with the observation, illustrated in Fig. 1, 
that biological protein networks involve both common 
processes that all cells must use (e.g. enzymes involved 
in the Krebs cycle, marked with red labels in Fig. 1) and 
what are termed modular processes [SHIP] that appear 
only in special situations (e.g. guidance molecules used 
only during particular circumstances, such as development, re- 
production, or response to heat stress, indicated by blue 
labels in Fig. 1). As we will show, this modular or- 
ganization produces common, and predictable, network 
properties shared by all organisms studied. We focus 
in particular on the fragility of biological networks - a 
property of manifest importance for survival - to attacks 
by interruption of individual protein function. To this 
end, we evaluate the extent to which protein interaction 
networks of 20 different organisms ranging from bacteria 
and plants to Homo sapiens (Table I) can be disrupted 
by either random or targeted attacks. 
As we have remarked, the modular construction shown 
in Fig. 1 consists of a highly interconnected core of pro- 
teins, accompanied by satellite clusters with "hub" pro- 
teins weakly connected to the core. As a consequence, 
three predictions can readily be made. First, this type 
of network can be expected to be vulnerable to attacks 
that interrupt the few hub proteins, but should be com- 
paratively robust against attacks that interrupt any of 
the more numerous proteins attached to 'spokes' of these 
hubs [9]. Thus random attacks are unlikely to significantly 
interrupt function, while malicious attacks directed against 
one or more hub proteins are likely to disrupt the network. 
Second, through countless generations of attacks, we expect 
evolution to have tuned biological networks to be more robust 
against attacks than statistically comparable, but 
non-biological, networks. Third, through the same reasoning we 
expect biological networks to be optimal in that alternative 
interconnections should worsen robustness. As we will show, 
these predictions are largely correct, but admit unexpected 
and revealing failures. 

Figure 1. (color online) Protein network for C. elegans [24]. 
Modular proteins identified include F09C12.7, an element of 
major sperm protein; K08B4.1a, involved in embryonic 
development and notch; and F26D10.3.2, which is a heat shock 
protein. On the other hand, the proteins identified in the 
central complex are essential to the Krebs cycle: F22D6.4.2 
encodes a subunit of NADH dehydrogenase, E04A4.7.4 is better 
known as cytochrome c 2.1, and C34E10.6.3 is ATP synthase. 

RESULTS AND DISCUSSION 

Figure 2. Robustness of the 20 protein networks from Table 
I against random and malicious attacks. Notice that for ev- 
ery organism studied, the robustness against random attacks 
is smaller than surrogates with identical degree distributions, 
while the robustness against malicious attacks is larger than 
such surrogates. Solid and open symbols correspond, respec- 
tively, to biological data and surrogates. Error bars, defining 
standard deviations over 20 randomized surrogate trials, are 
smaller than the symbol sizes. 



Organism                            ID       N        M    ⟨k⟩
Drosophila melanogaster - AA         1    3960    44409   22.4
Gallus gallus - AC                   2    3723    54131   29.1
Homo sapiens - AC                    3   12299   176316   28.7
Mus musculus - AC                    4    9595   123665   25.8
Xenopus tropicalis - AC              5    1870     7374    7.9
Caenorhabditis elegans - AN          6    2113    14261   13.5
Aspergillus fumigatus - FA           7    2364    29288   24.8
Saccharomyces cerevisiae - FA        8    5209    66057   25.4
Schizosaccharomyces pombe - FA       9    2458    28822   23.5
Arabidopsis thaliana - PM           10    4205    81957   40.0
Rhodococcus sp - AB                 11    5540    57992   20.9
Saccharopolyspora erythraea - AB    12    3715    24691   13.3
Aeromonas hydrophila - PB           13    2765    13849   10.0
Bradyrhizobium japonicum - PB       14    4948    29628   12.0
Citrobacter koseri - PB             15    3477    17288    9.9
Escherichia coli - PB               16    3542    25197   14.2
Nocardia farcinica - PB             17    3277    21359   13.0
Pseudomonas aeruginosa - PB         18    3709    20401   11.0
Serratia proteamaculans - PB        19    3392    16978   10.0
Vibrio cholerae - PB                20    2506    12899   10.3

TABLE I. List of organisms investigated. Acronyms in column 1 
indicate the kingdom and phylum to which each organism belongs: 
AA - Animalia Arthropoda; AB - Bacteria Actinobacteria; 
AC - Animalia Chordata; AN - Animalia Nematoda; FA - Fungi 
Ascomycota; PB - Bacteria Proteobacteria; PM - Plantae 
Magnoliophyta. The ID (second column) is used to identify 
organisms in Figs. 2 and 4. Columns 3, 4 and 5 give the number 
of nodes in the largest cluster N, the total number of edges M, 
and the average degree ⟨k⟩. Shadings correspond to Fig. 2. 



We examine protein networks of 20 different organisms 
in the bacteria and eukarya domains, identified in Table 
I along with the numbers of nodes N (i.e. proteins), edges M 
(connections between proteins), and the average degree 
⟨k⟩ (number of connections per protein) of the largest 
connected component of each network. Our measure of 
robustness is essentially unaffected by the small number 
of isolated nodes that are detached from the largest cluster, 
so we neglect these in our analysis. We used the 
STRING 8.2 "Combined Score" (CS) [18], a measure of 
the likelihood that two proteins interact in a given network, 
to impose the criterion that an edge $e_{ij}$ is included 
in the network only if $CS_{ij}$ is over a threshold value, 
$CS_{th} = 70\%$. Smaller values of $CS_{th}$ produce dramatic 
growth in the number of edges, masking relevant information 
with extraneous information, while larger $CS_{th}$ 
excludes known protein interactions [19]. 
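
The edge-selection step described above can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not the code used for this work: the function names, the representation of the combined score as a fraction between 0 and 1, and the toy protein labels are all hypothetical. The sketch keeps a pair only if its score exceeds the threshold, extracts the largest connected cluster, and reports N, M and the average degree 2M/N.

```python
from collections import defaultdict, deque


def build_thresholded_network(scored_pairs, cs_threshold=0.70):
    """Keep an edge (a, b) only if its combined score exceeds the threshold.

    Scores are assumed to be fractions in [0, 1] (an assumption of this sketch).
    """
    adjacency = defaultdict(set)
    for a, b, score in scored_pairs:
        if a != b and score > cs_threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def largest_component(adjacency):
    """Return the node set of the largest connected cluster (breadth-first search)."""
    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def network_summary(scored_pairs, cs_threshold=0.70):
    """Return (N, M, mean degree) of the largest cluster after thresholding."""
    adjacency = build_thresholded_network(scored_pairs, cs_threshold)
    core = largest_component(adjacency)
    n = len(core)
    m = sum(len(adjacency[v] & core) for v in core) // 2
    mean_degree = 2 * m / n if n else 0.0
    return n, m, mean_degree


# Toy data with hypothetical protein labels and scores.
toy_pairs = [("A", "B", 0.90), ("B", "C", 0.80), ("C", "A", 0.75),
             ("C", "D", 0.65), ("E", "F", 0.95)]
print(network_summary(toy_pairs))  # -> (3, 3, 2.0)
```

Lowering the threshold in this toy example to 0.60 admits the C-D pair and enlarges the cluster, which mirrors the growth in the number of edges at smaller thresholds noted above.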
Typical results are presented in Figs. 2-4, showing the 
dependence of robustness on random and malicious attacks 
for several network types. As one would expect, for all 
networks the tolerance to random attacks is high (Fig. 
2, red data) and to malicious attacks is low (Fig. 2, 
blue data). Unexpectedly, however, we find that all 
biological networks studied have a significantly lower 
resistance to random attacks, and a significantly higher 
resistance to malicious attacks, than do surrogates 
randomized $T_m = 10^8$ times as described in Methods. This 
paradoxical behavior can be analyzed in further detail as 
shown in Fig. 3. In that figure, we plot detailed responses 
to systematic randomization, using C. elegans as an 
exemplar, compared with several non-biological networks. 

For all networks in Fig. 3(a), we find that small amounts 
of random rewiring improve network robustness to ran- 
dom attacks; for biological and other modular networks 
(for example airlines, shown as triangles in the plot), the 
improvement is much larger than for less obviously mod- 
ular networks such as citations or access points ("points- 
of-presence") to the Internet. By contrast, the behav- 
ior of a second class of highly redundant networks, for 
example the entire Internet or corporate ownership net- 
works, is shown in the insets to Fig. 3. These networks 
are nearly optimally robust, since switching connections 
tends to reduce network robustness. 
Thus a first and unexpected finding of this analysis is 
that, as shown in Fig. 3(a), biological protein networks and 
other modular networks are less than ideally organized 
from the point of view of robustness against random attacks, 
insofar as this robustness can be significantly improved 
by any amount of rewiring. A second unexpected 
finding, shown in Fig. 3(b), is that although biological 
protein networks are more than twice as robust against 
malicious attacks as any other network tested, modest 
modifications of the protein interaction structure can 
improve the network robustness by amounts ranging from 2% 
for H. sapiens and 12% for C. elegans (Fig. 3(b)) up to 28% 
for G. gallus. Apparently, despite the manifest two-fold 
improvement in robustness shown in Fig. 3(b) that evolution 
has produced, life remains among a class of networks that are 
more fragile to either random or malicious attacks than 
slightly modified surrogates. 

Figure 3. Fundamentally different behaviors of fragile and 
robust networks. Robustness against random attacks, $R_{RA}$, 
increases with an increasing fraction C/M of changed edges 
for C. elegans (filled circles) and other networks such as 
airline (triangles) [25], citation (stars) [26] and 
point-of-presence networks (open circles) [27], while by 
contrast network robustness decreases with C/M for the 
Internet (squares) [20] and corporate ownership network 
(diamonds) [22]. Note that the improvement in robustness 
against random attacks is significantly larger for C. elegans 
and airline networks, both of which are modular, and is 
opposite to that of the Internet (inset). Likewise the 
robustness against malicious attacks, $R_{HDA}$, differs 
between biological and other networks. $R_{HDA}$ increases 
with C/M by up to 12% until C/M ≈ 1, after which $R_{HDA}$ 
decreases for biological networks, in contrast with all 
other networks except for the ownership network, for which 
$R_{HDA}$ monotonically increases with C/M. For better 
visibility some data are shown in the insets, which use 
the same axes as the main plot; curve fits are included to 
aid the eye. 


This effect, which holds for all of the 20 organisms studied, 
differs markedly from a second class of networks, 
shown in the insets to Figs. 3(a)-(b), that is exemplified 
by the Internet [20], which was designed for maximal 
robustness against errors [21], and to a lesser extent by 
corporate ownership networks, which are robust by virtue of 
similarly numerous inter-relations [22]. Our findings 
therefore indicate that although the Internet and PPI 
networks share broad degree distributions, the two types of 
networks behave fundamentally differently in their overall 
fragility as measured by comparison with modified 
surrogates. 

Figure 4. Effects of modularity on robustness in model 
(curves) and sample biological networks (data points, for C. 
elegans). Insets show simple network (left) and more complex 
network (right) fit to C. elegans data. Network robustnesses 
are shown as dashed and solid lines respectively. 

To investigate the consistency of these results, we 
repeated our analyses under various modifications. First, 
we evaluated the reliability of the data itself by 
considering both a higher value of the threshold likelihood of 
protein interactions, $CS_{th} = 80\%$, as well as data from a 
different version of STRING, 8.1 [33]. Second, we 
considered whether the robustness could be an indirect effect 
of a change in correlation between nodes, for example 
as high degree nodes are swapped with low degree ones. 
For this purpose, we modified the rewiring to preserve 
correlations by performing swaps between pairs of nodes 
$\{(i,j),(k,\ell)\} \to \{(i,\ell),(k,j)\}$ only if the degrees 
of $i$ and $k$, or of $j$ and $\ell$, are equal. Third, we 
considered the effect of randomly removing individual edges, 
described again by Eq. (2) but with N defined to be the 
number of edges, rather than nodes. In each of these 
independent tests, we found the same features commented on 
for Figs. 2 and 3, supporting the two key results that 
biological networks exhibit more fragility to random errors 
than similar non-biological networks, and that although 
biological networks are more than twice as robust against 
malicious attacks as non-biological networks, they remain 
less than optimal. 
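
The correlation-preserving rewiring used in the second consistency test can be sketched in Python as follows. This is a minimal sketch under assumptions, not the implementation used here: the function names, the rejection of self-loops and duplicate edges, and the small test graph are hypothetical choices. A switch of two edges is accepted only when the degrees of i and k, or of j and l, are equal, so that the set of degree pairs joined by edges is left unchanged.

```python
import random
from collections import Counter


def degree_map(edges):
    """Degree of every node, counted from the edge list."""
    return Counter(node for edge in edges for node in edge)


def degree_pair_counts(edges):
    """Multiset of (degree, degree) pairs joined by an edge."""
    deg = degree_map(edges)
    return Counter(tuple(sorted((deg[u], deg[v]))) for u, v in edges)


def correlation_preserving_rewire(edges, n_attempts, seed=0):
    """Attempt n_attempts switches {(i,j),(k,l)} -> {(i,l),(k,j)}.

    A switch is accepted only if deg(i) == deg(k) or deg(j) == deg(l).
    Self-loops and duplicate edges are rejected (an assumption of this sketch).
    Returns the new edge list and the number of accepted switches.
    """
    rng = random.Random(seed)
    deg = degree_map(edges)  # degrees never change under these switches
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    accepted = 0
    for _ in range(n_attempts):
        a, b = rng.sample(range(len(edge_list)), 2)
        i, j = edge_list[a]
        k, l = edge_list[b]
        if rng.random() < 0.5:
            k, l = l, k  # pick the orientation of the second edge at random
        if len({i, j, k, l}) < 4:
            continue
        if deg[i] != deg[k] and deg[j] != deg[l]:
            continue  # this switch would alter degree-degree correlations
        new_a, new_b = frozenset((i, l)), frozenset((k, j))
        if new_a in edge_set or new_b in edge_set:
            continue
        edge_set -= {frozenset((i, j)), frozenset((k, l))}
        edge_set |= {new_a, new_b}
        edge_list[a], edge_list[b] = (i, l), (k, j)
        accepted += 1
    return edge_list, accepted


# Hypothetical test graph: a ring of 12 nodes with two chords.
ring = [(i, (i + 1) % 12) for i in range(12)] + [(0, 6), (3, 9)]
rewired, n_accepted = correlation_preserving_rewire(ring, 500, seed=1)
print(n_accepted, degree_pair_counts(rewired) == degree_pair_counts(ring))
```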

We have reported three curious and previously unex- 
plored properties of protein-protein interaction (PPI) 
networks. (1) PPI networks are less robust against ran- 
dom attacks than surrogates with identical numbers of 
nodes, edges, and degree distributions; (2) PPI net- 
works are more robust against malicious attacks than 
surrogates; and (3) despite millennia of evolutionary pres- 
sure, PPI network robustness against malicious attacks 
remains suboptimal and can be improved by modest 
rewiring. 

To analyze the causes of these unexpected behaviors, 
we return to the observation made earlier that PPI net- 
works are intrinsically modular. Since modules have 
many fewer nodes than the central network, it follows 
that any switch involving a node in a module is highly 
likely to involve a second connection that is outside of 
the module. Such a switch will produce two new edges, 
both of which will connect the module to the central net- 
work, so switching connections will typically increase the 
number of connections from a module to the central net- 
work. This in turn will improve the robustness of that 
module to either random or malicious attack, since such 
a switch would increase the number of connections that 
would have to be broken between the module and the 
rest of the network. 

We can test this mechanism by constructing model net- 
works with suitable properties, two of which are shown in 
Fig. 4. In that figure, we compare both simple and more 
finely tuned model networks with the biological data that 
appear as solid symbols in Fig. 3. The simple model (left 
inset) is constructed by creating a large central complex 
with a broad degree distribution. An arbitrary small num- 
ber (eight, here) of modules, each with a different number 
of nodes but the same number of random connections, are 
added, and each is attached to random nodes in the central 
complex by two connections. The robustness in response 
to random and malicious attacks is then evaluated ex- 
actly as before, and is plotted in Fig. 4 as dashed curves. 
By contrast, non-modular model networks that we have 
constructed (not shown) have few vital hubs and so ex- 
hibit identical responses to either random or malicious 
attacks, with no dependence on C/M. 
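
One way to build a network of the simple modular kind described above is sketched below in Python. It is an illustration under assumptions rather than the construction actually used: the text does not specify how the central complex is generated, so growth by preferential attachment is assumed here as one common way to obtain a broad degree distribution, and the function names, core size, module sizes, and number of edges per module are all hypothetical.

```python
import random


def preferential_core(n_core, m_links, rng):
    """Grow a core by preferential attachment (assumed, to get a broad degree distribution)."""
    edges = [(i, j) for i in range(m_links + 1) for j in range(i)]  # complete seed
    ends = [v for e in edges for v in e]  # each node appears once per edge end
    for new in range(m_links + 1, n_core):
        chosen = set()
        while len(chosen) < m_links:
            chosen.add(rng.choice(ends))  # probability proportional to degree
        for old in chosen:
            edges.append((new, old))
            ends += [new, old]
    return edges


def random_module(nodes, n_edges, rng):
    """Connected module with exactly n_edges: a random tree plus random extra edges."""
    nodes = list(nodes)
    max_edges = len(nodes) * (len(nodes) - 1) // 2
    if not len(nodes) - 1 <= n_edges <= max_edges:
        raise ValueError("n_edges is incompatible with the module size")
    edges = set()
    for idx in range(1, len(nodes)):
        edges.add(frozenset((nodes[idx], rng.choice(nodes[:idx]))))
    while len(edges) < n_edges:
        edges.add(frozenset(rng.sample(nodes, 2)))
    return [tuple(sorted(e)) for e in edges]


def modular_network(n_core=200, m_links=2, module_sizes=tuple(range(6, 14)),
                    module_edges=15, seed=0):
    """Central complex plus modules, each tied to the core by two connections."""
    rng = random.Random(seed)
    edges = preferential_core(n_core, m_links, rng)
    next_id = n_core
    for size in module_sizes:  # eight modules with the default sizes 6..13
        members = range(next_id, next_id + size)
        next_id += size
        edges += random_module(members, module_edges, rng)
        for core_node in rng.sample(range(n_core), 2):  # two distinct core nodes
            edges.append((rng.choice(members), core_node))
    return edges


model_edges = modular_network()
print(len({v for e in model_edges for v in e}), len(model_edges))  # -> 276 533
```

Because every module hangs from the core by only two edges, removing the two attachment points detaches it, which is the structural weakness that rewiring tends to repair.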

Evidently, the qualitative behavior of biological re- 
sponses to random as well as malicious attacks can be 
attributed to the modular structure of biological protein 
networks. Indeed, it is not difficult to tune the model 
network to nearly exactly fit the biological data. This is 
shown in the right inset of Fig. 4, where we display a 
fictitious network whose random and malicious response 
curves are shown as solid lines in the main plot. This 
network is constructed by choosing the number of con- 
nections of the model to be similar to the biological one. 
In detail, the nodes are distributed in 20 modules with 
different densities, in which high degree nodes are prefer- 
entially connected to high degree nodes. This preferen- 
tial connectivity is crucial to the reduction in robustness 
to malicious attacks: for no other structural feature in- 
vestigated was this reduction seen. These modules are 
connected preferentially to the largest module with few 
connections, as we have remarked occurs in biological 
networks. 

Thus the first, simpler, network of Fig. 4 demonstrates 
that the presence of modularity is sufficient to qualitatively 
account for most of the curious behaviors of PPI network 
robustness in response to both random and malicious at- 
tacks. 

Returning to Figs. 2 and 3, we see that for surrogates 
with large numbers of switches of connections, the ro- 
bustness of PPI networks to malicious attacks actually 
decreases for all organisms studied. This behavior can 
also be reproduced in model networks provided, crucially, 
that connections are preferentially included between high 
degree nodes. In this case, two competing effects arise. 



The randomization of the modules increases robustness, 
while the vanishing of the preferential connections de- 
creases the robustness. In the case of random attacks the 
second effect is negligible, but for malicious attacks, it 
leads to the surprising decrease in robustness that we 
have noted. 

In conclusion, we have demonstrated that biological 
protein networks are unexpectedly fragile against either 
random or malicious attacks. This fragility is measurable 
by comparison with surrogates with identical network 
statistics. We find that these behaviors are characteristic 
of modular networks, in which particular products or 
processes inhabit isolated modules. As anticipated 
earlier in this letter, we have confirmed (1) that this 
modular structure causes biological protein networks to 
be more vulnerable to targeted than random attacks, 
and (2) that through evolution these networks have be- 
come more robust than non-biological networks against 
malicious attacks. Nevertheless, as we have shown, pro- 
tein networks are more fragile than extensively rewired 
surrogates to random attacks, while being less fragile 
than the same surrogates to malicious attacks. We 
find that this final phenomenon is associated with the 
apparently unique tendency of high degree nodes in PPI 
networks to preferentially connect to other high degree 
nodes. We speculate that this preferential connectivity 
may have practical advantages, for example in pro- 
viding redundant pathways to permit key processes to 
function after a malicious attack or genetic deletion [4-6]. 



METHODS 

To test these predictions, we compare known protein 
networks with surrogates that are as statistically 
similar as possible. To this end, we generate randomized 
surrogate networks having the same size and degree 
distribution as true biological networks. To create such 
surrogates, we perform a sequence of randomly chosen 
switches of connections between pairs of nodes, 
$\{(i,j),(k,\ell)\} \to \{(i,\ell),(k,j)\}$, in a network, so 
that each node preserves its number of neighbors [11]. The 
randomizing algorithm is repeated $T_m$ times, where $T_m$ 
ranges up to $10^8$. For the organisms we study, $T_m = 10^8$ 
ensures that each edge has been swapped more than $10^3$ 
times, effectively destroying any initial correlation in the 
network. We evaluate correlations between nodes by 
calculating the nearest-neighbor average connectivity [12], 

$$k_{nn}(k) = \sum_{k'} k' P(k'|k), \qquad (1)$$

where $P(k'|k)$ is the conditional probability that a node 
with degree $k$ is connected with one of degree $k'$. Indeed, 
the surrogates so created are uncorrelated. 
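
The surrogate construction and the correlation measure of Eq. (1) can be sketched in Python as follows. This is a minimal sketch, not the code used for the results reported here: the function names are hypothetical, and rejecting switches that would create self-loops or duplicate edges is an assumption of the sketch. The function `knn` averages, over all edge ends leaving nodes of degree k, the degree of the node at the other end, which is the sum over k' of k' P(k'|k).

```python
import random
from collections import defaultdict


def degrees(edges):
    """Degree of every node, counted from the edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def rewire(edges, n_swaps, seed=0):
    """Perform up to n_swaps switches {(i,j),(k,l)} -> {(i,l),(k,j)}.

    Each node keeps its number of neighbors. Switches creating self-loops or
    duplicate edges are rejected (an assumption of this sketch).
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done, attempts = 0, 0
    while len(edge_list) >= 2 and done < n_swaps and attempts < 100 * n_swaps + 100:
        attempts += 1
        a, b = rng.sample(range(len(edge_list)), 2)
        i, j = edge_list[a]
        k, l = edge_list[b]
        if rng.random() < 0.5:
            k, l = l, k  # pick the orientation of the second edge at random
        if len({i, j, k, l}) < 4:
            continue
        new_a, new_b = frozenset((i, l)), frozenset((k, j))
        if new_a in edge_set or new_b in edge_set:
            continue
        edge_set -= {frozenset((i, j)), frozenset((k, l))}
        edge_set |= {new_a, new_b}
        edge_list[a], edge_list[b] = (i, l), (k, j)
        done += 1
    return edge_list


def knn(edges):
    """Nearest-neighbor average connectivity k_nn(k), as in Eq. (1)."""
    deg = degrees(edges)
    total, count = defaultdict(int), defaultdict(int)
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            total[deg[a]] += deg[b]
            count[deg[a]] += 1
    return {k: total[k] / count[k] for k in sorted(count)}


# A star is maximally disassortative: leaves (k=1) see only the hub (k=4).
star = [(0, leaf) for leaf in range(1, 5)]
print(knn(star))  # -> {1: 4.0, 4: 1.0}
```

For an uncorrelated surrogate, the values returned by `knn` should be roughly independent of k, whereas a network in which high degree nodes attach preferentially to each other gives values that grow with k.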
Given a network and its surrogates, we evaluate the 
"robustness" (defined shortly) of the network to random 
or targeted, malicious, attacks. For biological networks, 
random attacks (RA) [13] take into account single gene 
changes due to radiation or mutagen exposure and errors 
in transcription. By contrast, malicious attacks describe 
situations in which pathogens or toxins interfere 
with high degree hubs of the network. Such an attack 
is termed a "high degree based adaptive attack (HDA)" 
in the literature [14-16]. To define the robustness of a 
network against either random or malicious attack, we 
evaluate the sum of the fractions of the largest connected 
cluster while removing all nodes, 

$$R = \frac{1}{N} \sum_{q=1}^{N} s(q), \qquad (2)$$

where N is the number of nodes in the network and s(q) 
is the fraction of nodes in the largest connected cluster 
after removing q nodes. This measure has the advantage 
over other metrics of robustness, e.g. percolation [15], in 
that it can distinguish between different networks with 
similar "percolation thresholds", at which a significant 
number of elements of a network form a single cluster 
[17]. The normalization $1/N$ in Eq. (2) ensures that 
the robustness is comparable for different network sizes, 
and the value of R lies between $1/N$ and 0.5. The lower 
limit on R corresponds to entirely isolated nodes, and 
R = 0.5 defines a network where all unattacked nodes 
remain in a single cluster. 
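
The robustness measure of Eq. (2) can be evaluated with a short Python sketch such as the one below. It is a minimal, unoptimized illustration under assumptions, not the implementation used in this work: the function names are hypothetical, the largest cluster is recomputed from scratch after every removal, and ties between equal-degree nodes in the HDA attack are broken by node order. For a random attack nodes are removed in a shuffled order; for HDA the node with the highest current degree is removed at each step.

```python
import random


def largest_cluster_size(adj, removed):
    """Size of the largest connected cluster among nodes not yet removed."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best


def robustness(edges, attack="random", seed=0):
    """R = (1/N) * sum over q = 1..N of s(q), as in Eq. (2).

    attack="random": nodes are removed in a random order.
    attack="hda": the node with the highest current degree is removed at each
    step, with degrees recomputed after every removal (ties: first in node order).
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n = len(adj)
    order = list(adj)
    random.Random(seed).shuffle(order)
    removed, total = set(), 0.0
    for q in range(1, n + 1):
        if attack == "random":
            target = order[q - 1]
        else:
            alive = [v for v in adj if v not in removed]
            target = max(alive, key=lambda v: len(adj[v] - removed))
        removed.add(target)
        total += largest_cluster_size(adj, removed) / n  # this is s(q)
    return total / n


# Complete graph on 5 nodes: all unattacked nodes stay in one cluster,
# so R = (N - 1) / (2N), which approaches 0.5 for large N.
complete = [(i, j) for i in range(5) for j in range(i)]
# Star on 5 nodes: removing the hub first isolates every remaining node.
star = [(0, leaf) for leaf in range(1, 5)]
print(robustness(complete, "hda"), robustness(star, "hda"))
```

In the complete graph s(q) = (N - q)/N, so the sum reproduces the upper limit quoted above, while the star under HDA falls close to the lower limit because only isolated nodes survive the first removal.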